There is increasing interest in the use of diagnostic rules based on microarray data. These rules are formed by considering the expression levels of thousands of genes in tissue samples taken from patients whose classification is known with respect to a number of classes representing, say, disease status or treatment strategy. As the final versions of these rules are usually based on a small subset of the available genes, there is a selection bias that has to be corrected for in the estimation of the associated error rates. We consider this problem using cross-validation. In particular, we present explicit formulae that are useful in explaining the layers of validation that have to be performed in order to avoid improperly cross-validated estimates.
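
The selection bias described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the formulae or procedure of the paper: the data are pure Gaussian noise, so no rule can truly beat chance, and the gene-ranking criterion (absolute difference in class means), the nearest-centroid classifier, the sample sizes, and all function names are hypothetical choices made only for illustration. It compares a leave-one-out estimate in which genes are selected once on all samples (improper) with one in which selection is repeated inside every fold (proper).

```python
import random


def select_genes(X, y, k):
    """Rank genes by absolute difference in class means; return the top-k indices.

    This ranking criterion is an illustrative assumption, not the paper's.
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((abs(sum(a) / len(a) - sum(b) / len(b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def nearest_centroid_predict(X_train, y_train, x_test, genes):
    """Assign x_test to the class whose centroid (on the chosen genes) is closest."""
    dist = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X_train, y_train) if l == lab]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dist[lab] = sum((x_test[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(dist, key=dist.get)


def loo_error(X, y, k, select_inside_fold):
    """Leave-one-out error rate.

    select_inside_fold=False: genes are chosen once using ALL samples, so the
    held-out sample has already influenced the rule (improper).
    select_inside_fold=True: genes are re-chosen on each training fold (proper).
    """
    fixed = None if select_inside_fold else select_genes(X, y, k)
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_tr, y_tr, k) if select_inside_fold else fixed
        errors += nearest_centroid_predict(X_tr, y_tr, X[i], genes) != y[i]
    return errors / len(X)


# Simulated data (assumption): 20 samples, 1000 noise "genes", labels
# unrelated to the expression values.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(20)]
y = [0] * 10 + [1] * 10

improper_err = loo_error(X, y, k=10, select_inside_fold=False)
proper_err = loo_error(X, y, k=10, select_inside_fold=True)
print("improper (selection outside CV):", improper_err)
print("proper (selection inside CV):  ", proper_err)
```

Because the labels carry no information here, an honest estimate should sit near chance level. Any apparent accuracy reported by the improper estimate comes only from having chosen the genes with the held-out samples in view.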